COMPARISON AND ANTI-CONCENTRATION BOUNDS 
FOR MAXIMA OF GAUSSIAN RANDOM VECTORS 



VICTOR CHERNOZHUKOV, DENIS CHETVERIKOV, AND KENGO KATO

Abstract. Slepian and Sudakov-Fernique type inequalities, which com- 
pare expectations of maxima of Gaussian random vectors under certain 
restrictions on the covariance matrices, play an important role in proba- 
bility theory, especially in empirical process and extreme value theories. 
Here we give explicit comparisons of expectations of smooth functions 
and distribution functions of maxima of Gaussian random vectors with- 
out any restriction on the covariance matrices. We also establish an 
anti-concentration inequality for maxima of Gaussian random vectors, 
which derives a useful upper bound on the Levy concentration function 
for the maximum of (not necessarily independent) Gaussian random 
variables. The bound is universal and applies to vectors with arbitrary 
covariance matrices. This anti-concentration inequality plays a crucial 
role in establishing bounds on the Kolmogorov distance between maxima 
of Gaussian random vectors. These results have immediate applications 
in mathematical statistics. As an example of application, we establish a 
conditional multiplier central limit theorem for maxima of sums of inde- 
pendent random vectors where the dimension of the vectors is possibly 
much larger than the sample size. 



1. Introduction 

We derive a bound on the difference in expectations of smooth functions 
of maxima of finite dimensional Gaussian random vectors. We also derive a 
bound on the Kolmogorov distance between distributions of these maxima. 
The key property of these bounds is that they depend on the dimension p of 
Gaussian random vectors only through log p, and on the maximum difference 
between the covariance matrices of the vectors. These results extend and 
complement the work of [7] that derived an explicit Sudakov-Fernique type
bound on the difference of expectations of maxima of Gaussian random 
vectors. See also [lj], Chapter 2. As an application, we establish a conditional 
multiplier central limit theorem for maxima of sums of independent random 
vectors where the dimension of the vectors is possibly much larger than the
sample size. In all these results, we allow for arbitrary covariance structures 
between the coordinates in random vectors, which is plausible especially in 
applications to high-dimensional statistics. We stress that the derivation of 
bounds on the Kolmogorov distance is by no means trivial and relies on the 
new anti-concentration inequality for maxima of Gaussian random vectors,
which is another main result of this paper (see Comment [3] for what anti- 
concentration inequalities here precisely refer to and how they differ from the 
concentration inequalities). These anti-concentration bounds are non-trivial 
in the following sense: (i) they apply to every dimension p and are explicit 
in the effect of the dimension p, (ii) they allow for arbitrary covariance 
structures between the coordinates in Gaussian random vectors, and (iii) 
they are sharp in the sense that there is an example for which the bound 
is tight up to a dimension independent constant. We note that these anti- 
concentration bounds are sharper than those that result from the application 
of the universal reverse isoperimetric inequality of [2] (see Comment [5]). This
happens due to the special structure of the sets of interest. 

Comparison inequalities for Gaussian random vectors play an important 
role in probability theory, especially in empirical process and extreme value 
theories. We refer the reader to Q, 0, [H, Gt 0> Q. and frfa 
for standard references on this topic. The anti-concentration phenomenon 
has attracted considerable interest in the context of random matrix theory
and the Littlewood-Offord problem in number theory. See, e.g., [T^j, [20],
and [26], who remarked that "concentration is better understood than
anti-concentration". Those papers were concerned with the anti-concentration in
the Euclidean norm for sums of independent random vectors, and the topic 
and the proof technique here are substantially different from theirs. 

Both the comparison and the anti-concentration bounds derived in the paper
have many immediate statistical applications, especially in the context
of high-dimensional statistical inference, where the dimension p of vectors of 
interest is much larger than the sample size (see 0] for a textbook treatment 
of the recent developments of high-dimensional statistics). In particular, 
the results established here are helpful in deriving an invariance principle 
for sums of high-dimensional random vectors, and also in establishing the 
validity of the multiplier bootstrap for inference in practice. We refer the 
reader to a companion paper [9], where the results established here are ap-
plied in several important statistical problems, particularly the analysis of 
Dantzig selector of @] in the non-Gaussian setting. 

After the initial submission, we became aware of the work [17], which
derives bounds on the density function of the maximum of a Gaussian random
vector (see [17], Proposition 3.12) under a positive-covariance restriction.
This is related to but different from our anti-concentration bounds. The
crucial assumption in Proposition 3.12 of [17] is positivity of all the
covariances between the coordinates of the Gaussian random vector, which does
not hold in our targeted applications in high-dimensional statistics, e.g.,
the analysis of the Dantzig selector. Moreover, the upper bound on the density
in [17] depends on the inverse of the lower bound on the covariances, and
hence, e.g., if there are two independent coordinates in the Gaussian random
vector, then the upper bound becomes infinite. Our anti-concentration bounds
do not require such positivity (or other) assumptions on covariances and
hence are not implied by the results of [13]. Moreover, the proof technique used
here is substantially different from that of [17], which is based on Malliavin calculus.

The rest of the paper is organized as follows. In Section 2 we present
comparison bounds for Gaussian random vectors and their application, namely
the conditional multiplier central limit theorem. In Section 3 we present
anti-concentration bounds for maxima of Gaussian random vectors. In
Sections 4 and 5 we give proofs of the theorems in Sections 2 and 3. The
Appendix contains a proof of a technical lemma.

Notation. Denote by $(\Omega, \mathcal{F}, \mathrm{P})$ an underlying probability space. For
$a, b \in \mathbb{R}$, we write $a_+ = \max\{0, a\}$ and $a \vee b = \max\{a, b\}$. Let $1(\cdot)$ denote
the indicator function. The transpose of a vector $z$ is denoted by $z^T$. For a
function $g : \mathbb{R} \to \mathbb{R}$, we use the notation $\|g\|_\infty = \sup_{z \in \mathbb{R}} |g(z)|$.

2. Comparison Bounds and Multiplier Bootstrap 

2.1. Comparison bounds. Let $X = (X_1, \dots, X_p)^T$ and $Y = (Y_1, \dots, Y_p)^T$
be centered Gaussian random vectors in $\mathbb{R}^p$ with covariance matrices
$\Sigma^X = (\sigma^X_{jk})_{1 \le j,k \le p}$ and $\Sigma^Y = (\sigma^Y_{jk})_{1 \le j,k \le p}$, respectively. The purpose of this
section is to give error bounds on the difference of the expectations of smooth
functions and the distribution functions of

\[ \max_{1 \le j \le p} X_j \quad \text{and} \quad \max_{1 \le j \le p} Y_j \]

in terms of $p$ and

\[ \Delta := \max_{1 \le j,k \le p} |\sigma^X_{jk} - \sigma^Y_{jk}|. \]



The problem of comparing distributions of maxima is of intrinsic difficulty
since the maximum function $z = (z_1, \dots, z_p)^T \mapsto \max_{1 \le j \le p} z_j$ is
non-differentiable. To circumvent the problem, we use a smooth approximation
of the maximum function. For $z = (z_1, \dots, z_p)^T \in \mathbb{R}^p$, consider the function

\[ F_\beta(z) := \beta^{-1} \log \Big( \sum_{j=1}^p \exp(\beta z_j) \Big), \]

which approximates the maximum function, where $\beta > 0$ is the smoothing
parameter that controls the level of approximation (we call this function the
"smooth max function"). Indeed, an elementary calculation shows that for
every $z \in \mathbb{R}^p$,

\[ 0 \le F_\beta(z) - \max_{1 \le j \le p} z_j \le \beta^{-1} \log p. \tag{1} \]
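As a numerical illustration of the sandwich inequality (1), the following minimal sketch evaluates the smooth max function by a log-sum-exp computation and records the gap to the true maximum for several values of the smoothing parameter. The function name `smooth_max` and the test vector are illustrative assumptions, not part of the paper.

```python
import math

def smooth_max(z, beta):
    """F_beta(z) = (1/beta) * log(sum_j exp(beta * z_j)), computed stably."""
    m = max(z)
    # Subtracting the maximum before exponentiating avoids overflow.
    return m + math.log(sum(math.exp(beta * (zj - m)) for zj in z)) / beta

z = [0.3, -1.2, 2.5, 2.4, 0.0]   # arbitrary illustrative vector
p = len(z)
# Gap F_beta(z) - max_j z_j for increasing beta; (1) says it lies in [0, log(p)/beta].
gaps = {beta: smooth_max(z, beta) - max(z) for beta in (1.0, 10.0, 100.0)}
for beta, gap in gaps.items():
    print(beta, gap, math.log(p) / beta)
```

Larger values of the smoothing parameter give a tighter approximation, at the price of larger derivatives of the smooth max function.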

This smooth max function arises in the definition of "free energy" in spin
glasses. See, e.g., [24] and [18]. Here is the first theorem of this section.

Theorem 1 (Comparison bounds for smooth functions). For every $g \in C^2(\mathbb{R})$
with $\|g'\|_\infty \vee \|g''\|_\infty < \infty$ and every $\beta > 0$,

\[ |E[g(F_\beta(X))] - E[g(F_\beta(Y))]| \le (\|g''\|_\infty/2 + \beta \|g'\|_\infty) \Delta, \]

and hence

\[ |E[g(\max_{1 \le j \le p} X_j)] - E[g(\max_{1 \le j \le p} Y_j)]| \le (\|g''\|_\infty/2 + \beta \|g'\|_\infty) \Delta + 2 \beta^{-1} \|g'\|_\infty \log p. \]

Proof. See Section 4. □
Comment 1. Minimizing the second bound with respect to $\beta > 0$, we have

\[ |E[g(\max_{1 \le j \le p} X_j)] - E[g(\max_{1 \le j \le p} Y_j)]| \le \|g''\|_\infty \Delta/2 + 2 \|g'\|_\infty \sqrt{2 \Delta \log p}. \]

This result extends the work of [7], which derived the following Sudakov-
Fernique type bound on the difference of the expectations of the Gaussian
maxima:

\[ |E[\max_{1 \le j \le p} X_j] - E[\max_{1 \le j \le p} Y_j]| \le 2 \sqrt{2 \Delta \log p}. \]
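The minimization in Comment 1 can be checked numerically. The sketch below assumes the second bound of Theorem 1 in the form (sup|g''|/2 + beta sup|g'|) Delta + 2 sup|g'| log(p)/beta, where Delta denotes the maximum entrywise difference of the covariance matrices; the minimizer is beta = sqrt(2 log(p)/Delta). The function name and the numerical values of Delta and p are illustrative assumptions.

```python
import math

def second_bound(beta, delta, p, g1=1.0, g2=1.0):
    """Right side of the second bound in Theorem 1.

    g1 and g2 stand for sup|g'| and sup|g''|; delta is the maximum entrywise
    difference of the two covariance matrices.
    """
    return (g2 / 2 + beta * g1) * delta + 2 * g1 * math.log(p) / beta

delta, p = 0.01, 1000                      # illustrative values
beta_star = math.sqrt(2 * math.log(p) / delta)
optimized = second_bound(beta_star, delta, p)
closed_form = delta / 2 + 2 * math.sqrt(2 * delta * math.log(p))
print(beta_star, optimized, closed_form)
```

The two printed bounds coincide up to rounding, and moving the smoothing parameter away from `beta_star` in either direction increases the bound.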

Theorem 1 is not applicable to functions of the form $g(z) = 1(z \le x)$ and
hence does not directly lead to a bound on the Kolmogorov distance between
$\max_{1 \le j \le p} X_j$ and $\max_{1 \le j \le p} Y_j$ (recall that the Kolmogorov distance between
the distributions of two real valued random variables $\xi$ and $\eta$ is defined by
$\sup_{x \in \mathbb{R}} |P(\xi \le x) - P(\eta \le x)|$). Nevertheless, we have the following bound
on the Kolmogorov distance.

Theorem 2 (Comparison of distributions). Suppose that $p \ge 2$ and $\sigma^Y_{jj} > 0$
for all $1 \le j \le p$. Then

\[ \sup_{x \in \mathbb{R}} |P(\max_{1 \le j \le p} X_j \le x) - P(\max_{1 \le j \le p} Y_j \le x)| \le C \Delta^{1/3} (1 \vee \log(p/\Delta))^{2/3}, \tag{2} \]

where $C > 0$ depends only on $\min_{1 \le j \le p} \sigma^Y_{jj}$ and $\max_{1 \le j \le p} \sigma^Y_{jj}$ (the right side
is understood to be 0 when $\Delta = 0$).

Proof. See Section 4. □

Deriving a bound on the Kolmogorov distance between $\max_{1 \le j \le p} X_j$ and
$\max_{1 \le j \le p} Y_j$ from Theorem 1 is not a trivial issue and this step relies on the
anti-concentration inequality for maxima of (not necessarily independent)
Gaussian random variables, which we will study in Section 3. Interestingly,
the proof of Theorem 2 is substantially different from the ("textbook") proof
of the classical Slepian's inequality. The simplest form of Slepian's inequality
states that

\[ P(\max_{1 \le j \le p} X_j \le x) \le P(\max_{1 \le j \le p} Y_j \le x), \quad \forall x \in \mathbb{R}, \]
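whenever $\sigma^X_{jj} = \sigma^Y_{jj}$ and $\sigma^X_{jk} \le \sigma^Y_{jk}$ for all $1 \le j, k \le p$. This inequality is
immediately deduced from the following expression:

\[ P(\max_{1 \le j \le p} X_j \le x) - P(\max_{1 \le j \le p} Y_j \le x) = \sum_{1 \le j < k \le p} (\sigma^X_{jk} - \sigma^Y_{jk}) \int_0^1 \Big\{ \int_{-\infty}^x \cdots \int_{-\infty}^x \frac{\partial^2 f_t(z)}{\partial z_j \partial z_k} \, dz \Big\} dt, \tag{3} \]

where $\sigma^X_{jj} = \sigma^Y_{jj}$, $1 \le \forall j \le p$, is assumed. Here $f_t$ denotes the density
function of $N(0, t\Sigma^X + (1 - t)\Sigma^Y)$. See [13], page 82, for this expression.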
The expression ([3]) is of importance and indeed a source of many interesting 
probabilistic results (see, e.g., [jjj] and (13] for recent related works). It 
is not clear (or at least non-trivial), however, whether a bound similar in 
nature to Theorem 2 can be deduced from the expression (3) when there
is no restriction on the covariance matrices except for the condition that
$\sigma^X_{jj} = \sigma^Y_{jj}$, $1 \le \forall j \le p$, and here we take a different route.

The key features of Theorem 2 are: (i) the bound on the Kolmogorov
distance between the maxima of Gaussian random vectors in $\mathbb{R}^p$ depends
on the dimension $p$ only through $\log p$ and the maximum difference of the
covariance matrices $\Delta$, and (ii) it allows for arbitrary covariance matrices for
$X$ and $Y$ (except for the nondegeneracy condition that $\sigma^Y_{jj} > 0$, $1 \le \forall j \le p$).
These features have an important implication to statistical applications, as 
discussed below. 



2.2. Conditional multiplier central limit theorem. Consider the following
problem. Suppose that $n$ independent centered random vectors in
$\mathbb{R}^p$ of observations $Z_1, \dots, Z_n$ are given. Here $Z_1, \dots, Z_n$ are generally
non-Gaussian, and the dimension $p$ is allowed to increase with $n$ (i.e., the case
where $p = p_n \to \infty$ as $n \to \infty$ is allowed). We suppress the possible dependence
of $p$ on $n$ for notational convenience. Suppose that each $Z_i$ has a
finite covariance matrix $E[Z_i Z_i^T]$. Consider the following normalized sum:

\[ S_n := (S_{n,1}, \dots, S_{n,p})^T := \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i. \]

The problem here is to approximate the distribution of $\max_{1 \le j \le p} S_{n,j}$.

Statistics of this form arise frequently in modern statistical applications.
The exact distribution of $\max_{1 \le j \le p} S_{n,j}$ is generally unknown. An intuitive
idea to approximate the distribution of $\max_{1 \le j \le p} S_{n,j}$ is to use the Gaussian
approximation. Let $V_1, \dots, V_n$ be independent Gaussian random vectors in
$\mathbb{R}^p$ such that $V_i \sim N(0, E[Z_i Z_i^T])$, and define

\[ T_n := (T_{n,1}, \dots, T_{n,p})^T := \frac{1}{\sqrt{n}} \sum_{i=1}^n V_i \sim N\Big(0, n^{-1} \sum_{i=1}^n E[Z_i Z_i^T]\Big). \]

It is expected that the distribution of $\max_{1 \le j \le p} T_{n,j}$ is close to that of
$\max_{1 \le j \le p} S_{n,j}$ in the following sense:

\[ \sup_{x \in \mathbb{R}} |P(\max_{1 \le j \le p} T_{n,j} \le x) - P(\max_{1 \le j \le p} S_{n,j} \le x)| \to 0, \quad n \to \infty. \tag{4} \]

When $p$ is fixed, (4) will follow from the classical Lindeberg-Feller central
limit theorem, subject to the Lindeberg conditions. The recent paper by [8]
established conditions under which this Gaussian approximation (4) holds
even when $p$ is comparable to or much larger than $n$. For example, [8] proved
that if $c_1 \le n^{-1} \sum_{i=1}^n E[Z_{ij}^2] \le C_1$ and $E[\exp(|Z_{ij}|/C_1)] \le 2$ for all $1 \le i \le n$
and $1 \le j \le p$ for some $0 < c_1 < C_1$, then (4) holds as long as $\log p = o(n^{1/7})$.

The Gaussian approximation (4) is in itself an important step, but in the
general case where the covariance matrix $n^{-1} \sum_{i=1}^n E[Z_i Z_i^T]$ is unknown, it
is not directly applicable for purposes of statistical inference. In such cases,
the following multiplier bootstrap procedure will be useful. Let $\eta_1, \dots, \eta_n$
be independent standard Gaussian random variables independent of
$Z_1^n := \{Z_1, \dots, Z_n\}$. Consider the following randomized sum:

\[ S_n^\eta := (S^\eta_{n,1}, \dots, S^\eta_{n,p})^T := \frac{1}{\sqrt{n}} \sum_{i=1}^n \eta_i Z_i. \]

Since conditional on $Z_1^n$,

\[ S_n^\eta \sim N\Big(0, n^{-1} \sum_{i=1}^n Z_i Z_i^T\Big), \]

it is natural to expect that the conditional distribution of $\max_{1 \le j \le p} S^\eta_{n,j}$ is
"close" to the distribution of $\max_{1 \le j \le p} T_{n,j}$ and hence that of $\max_{1 \le j \le p} S_{n,j}$.
Note here that the conditional distribution of $S_n^\eta$ is completely known, which
makes this distribution useful for purposes of statistical inference. The
following proposition makes this intuition rigorous.
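The multiplier bootstrap procedure described above can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it assumes the data are already centered and stored as a list of n rows of length p, it draws independent standard Gaussian multipliers, and it returns an empirical quantile of the simulated maxima of the randomized sum. The function name, the number of draws, and the empirical quantile rule are assumptions made for the example.

```python
import math
import random

def multiplier_bootstrap_quantile(Z, alpha, n_draws=1000, seed=0):
    """Estimate the (1 - alpha) conditional quantile of the maximum coordinate
    of n^{-1/2} * sum_i eta_i Z_i given the data Z.

    Z is a list of n observations, each a list of p numbers (assumed centered).
    """
    rng = random.Random(seed)
    n, p = len(Z), len(Z[0])
    maxima = []
    for _ in range(n_draws):
        eta = [rng.gauss(0.0, 1.0) for _ in range(n)]  # fresh Gaussian multipliers
        s = [sum(eta[i] * Z[i][j] for i in range(n)) / math.sqrt(n) for j in range(p)]
        maxima.append(max(s))
    maxima.sort()
    # Order statistic with roughly a (1 - alpha) fraction of the draws at or below it.
    k = min(n_draws - 1, max(0, math.ceil((1 - alpha) * n_draws) - 1))
    return maxima[k]

# Toy usage with synthetic data (purely illustrative).
data_rng = random.Random(3)
Z = [[data_rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(20)]
print(multiplier_bootstrap_quantile(Z, alpha=0.1, n_draws=500, seed=7))
```

Only the observed data enter the simulation, which is the practical point of the procedure: no knowledge of the true covariance structure is needed.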

Proposition 1 (Conditional multiplier central limit theorem). Work with
the setup as described above. Suppose that $p \ge 2$ and there are some
constants $0 < c_1 < C_1$ such that $c_1 \le n^{-1} \sum_{i=1}^n E[Z_{ij}^2] \le C_1$ for all $1 \le j \le p$.
Moreover, suppose that $\Delta := \max_{1 \le j,k \le p} |n^{-1} \sum_{i=1}^n (Z_{ij} Z_{ik} - E[Z_{ij} Z_{ik}])| = o_P((\log p)^{-2})$. Then

\[ \sup_{x \in \mathbb{R}} |P(\max_{1 \le j \le p} S^\eta_{n,j} \le x \mid Z_1^n) - P(\max_{1 \le j \le p} T_{n,j} \le x)| \stackrel{P}{\to} 0, \quad \text{as } n \to \infty. \tag{5} \]

Here recall that $p$ is allowed to increase with $n$.

Proof. By Theorem 2, we have

\[ \sup_{x \in \mathbb{R}} |P(\max_{1 \le j \le p} S^\eta_{n,j} \le x \mid Z_1^n) - P(\max_{1 \le j \le p} T_{n,j} \le x)| = O_P\{\Delta^{1/3} (1 \vee \log(p/\Delta))^{2/3}\}. \]

The right side is $o_P(1)$ as soon as $\Delta = o_P((\log p)^{-2})$. □

We call this result a "conditional multiplier central limit theorem," where
the terminology follows that in empirical process theory. See [25], Chapter
2.9. The notable features of this proposition, which are inherited from the features
of Theorem 2 discussed above, are: (i) (5) can hold even when $p$ is much
larger than $n$, and (ii) it allows for arbitrary covariance matrices for $Z_i$
(except for the mild scaling condition that $c_1 \le n^{-1} \sum_{i=1}^n E[Z_{ij}^2] \le C_1$). The
second point is clearly desirable in statistical applications as the information
on the true covariance structure is generally (but not always) unavailable.
For the first point, we have the following estimate on $E[\Delta]$.



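Lemma 1. Let $p \ge 2$. There exists a universal constant $C > 0$ such that

\[ E[\Delta] \le C \Big[ \max_{1 \le j \le p} \Big( n^{-1} \sum_{i=1}^n E[Z_{ij}^4] \Big)^{1/2} \sqrt{\frac{\log p}{n}} + \Big( E\big[ \max_{1 \le i \le n} \max_{1 \le j \le p} Z_{ij}^4 \big] \Big)^{1/2} \frac{\log p}{n} \Big]. \]

Proof. See Appendix. □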



Hence with the help of Lemma 2.2.2 in [25], we can find various primitive
conditions under which $\Delta = o_P((\log p)^{-2})$ so that (5) holds. Consider the
following examples.

Case (a): Suppose that $E[\exp(|Z_{ij}|/C_1)] \le 2$ for all $1 \le i \le n$ and
$1 \le j \le p$ for some $C_1 > 0$. In this case, it is not difficult to verify that
$\Delta = o_P((\log p)^{-2})$ as soon as $\log p = o(n^{1/5})$.

Case (b): Another type of $Z_{ij}$ which arises in regression applications
is of the form $Z_{ij} = \varepsilon_i x_{ij}$ where $\varepsilon_i$ are stochastic with $E[\varepsilon_i] = 0$ and
$\max_{1 \le i \le n} E[|\varepsilon_i|^{4q}] = O(1)$ for some $q \ge 1$, and $x_{ij}$ are non-stochastic
(typically, $\varepsilon_i$ are "errors" and $x_{ij}$ are "regressors"). Suppose that $x_{ij}$ are
normalized in such a way that $n^{-1} \sum_{i=1}^n x_{ij}^2 = 1$, and there are bounds $B_n \ge 1$
such that $\max_{1 \le i \le n} \max_{1 \le j \le p} |x_{ij}| \le B_n$, where we allow $B_n \to \infty$. In this
case, $\Delta = o_P((\log p)^{-2})$ as soon as

\[ \max\{ B_n^2 (\log p)^5, \; B_n^{4q/(2q-1)} (\log p)^{6q/(2q-1)} \} = o(n), \]

since $\max_{1 \le j \le p} (n^{-1} \sum_{i=1}^n E[(\varepsilon_i x_{ij})^4]) \le B_n^2 \max_{1 \le i \le n} E[\varepsilon_i^4] = O(B_n^2)$ and
$E[\max_{1 \le i \le n} \max_{1 \le j \le p} (\varepsilon_i x_{ij})^4] \le B_n^4 E[\max_{1 \le i \le n} \varepsilon_i^4] = O(n^{1/q} B_n^4)$.

Importantly, in these examples, for ([5]) to hold, p can increase exponen- 
tially in some fractional power of n. 

3. Anti-concentration Bounds 

The following theorem provides bounds on the Lévy concentration function
of the maximum of $p$ Gaussian random variables, where the terminology
is borrowed from [20].

Definition 1 ([20], Definition 3.1). The Lévy concentration function of a
real valued random variable $\xi$ is defined for $\epsilon > 0$ as

\[ \mathcal{L}(\xi, \epsilon) = \sup_{x \in \mathbb{R}} P(|\xi - x| \le \epsilon). \]

Theorem 3 (Anti-concentration). Let $X_1, \dots, X_p$ be (not necessarily
independent) centered Gaussian random variables with $\sigma_j^2 := E[X_j^2] > 0$ for
all $1 \le j \le p$. Moreover, let $\underline{\sigma} := \min_{1 \le j \le p} \sigma_j$, $\bar{\sigma} := \max_{1 \le j \le p} \sigma_j$, and
$a_p := E[\max_{1 \le j \le p} (X_j/\sigma_j)]$.

(i) If the variances are all equal, namely $\underline{\sigma} = \bar{\sigma} = \sigma$, then for every
$\epsilon > 0$,

\[ \mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \le 4\epsilon (a_p + 1)/\sigma. \]

(ii) If the variances are not equal, namely $\underline{\sigma} < \bar{\sigma}$, then for every $\epsilon > 0$,

\[ \mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \le C\epsilon \{ a_p + \sqrt{1 \vee \log(\underline{\sigma}/\epsilon)} \}, \]

where $C > 0$ depends only on $\underline{\sigma}$ and $\bar{\sigma}$.

The following simpler corollary is useful in applications. This corollary
will be used in the proof of Theorem 2.

Corollary 1. Let $X_1, \dots, X_p$ be (not necessarily independent) centered
Gaussian random variables with $\sigma_j^2 := E[X_j^2] > 0$ for all $1 \le j \le p$. Let
$\underline{\sigma} := \min_{1 \le j \le p} \sigma_j$ and $\bar{\sigma} := \max_{1 \le j \le p} \sigma_j$. Then for every $\epsilon > 0$,

\[ \mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \le C\epsilon \sqrt{1 \vee \log(p/\epsilon)}, \]

where $C > 0$ depends only on $\underline{\sigma}$ and $\bar{\sigma}$. When $\sigma_j$ are all equal, $\log(p/\epsilon)$ on
the right side can be replaced by $\log p$.

Proof of Corollary 1. Since $X_j/\sigma_j \sim N(0, 1)$, by a standard calculation, we
have $a_p \le \sqrt{2 \log p}$. See, e.g., Proposition 1.1.3 of [24]. Hence the corollary
follows from Theorem 3. □

Comment 2 (Anti-concentration vs. small ball probabilities). The problem
of bounding the Lévy concentration function $\mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon)$ is
qualitatively different from the problem of bounding $P(\max_{1 \le j \le p} |X_j| \le x)$. For a
survey on the latter problem, called the "small ball problem", we refer the
reader to [15].

Comment 3 (Concentration vs. anti-concentration). Concentration
inequalities refer to inequalities bounding $P(|\xi - x| > \epsilon)$ for a random variable
$\xi$ (typically $x$ is the mean or median of $\xi$). See the monograph [13] for
a study of the concentration of measure phenomenon. Anti-concentration
inequalities in turn refer to reverse inequalities, i.e., inequalities bounding
$P(|\xi - x| \le \epsilon)$. Theorem 3 provides anti-concentration inequalities
for $\max_{1 \le j \le p} X_j$. [26] remarked that "concentration is better understood
than anti-concentration". In the present case, the Gaussian concentration
inequality (see [13], Theorem 7.1) states that

\[ P(|\max_{1 \le j \le p} X_j - E[\max_{1 \le j \le p} X_j]| \ge r) \le 2 e^{-r^2/(2\bar{\sigma}^2)}, \quad r > 0, \]

where the mean can be replaced by the median. This inequality is well known
and dates back to d and [13]. To the best of our knowledge, however, the 
reverse inequalities in Theorem [3] were not known and are new. 
Comment 4 (Anti-concentration for maximum of moduli, $\max_{1 \le j \le p} |X_j|$).
Versions of Theorem 3 and Corollary 1 continue to hold for $\max_{1 \le j \le p} |X_j|$.
That is, e.g., when $\sigma_j$ are all equal ($\sigma_j = \sigma$), we have
$\mathcal{L}(\max_{1 \le j \le p} |X_j|, \epsilon) \le 4\epsilon (a'_p + 1)/\sigma$, where $a'_p := E[\max_{1 \le j \le p} |X_j|/\sigma]$. To see this, observe that
$\max_{1 \le j \le p} |X_j| = \max_{1 \le j \le 2p} X'_j$ where $X'_j = X_j$ for $j = 1, \dots, p$ and
$X'_{p+j} = -X_j$ for $j = 1, \dots, p$. Hence we may apply Theorem 3 to $X'_1, \dots, X'_{2p}$ to
obtain the desired conclusion.

The main feature of Theorem 3 is the fact that it provides qualitative
bounds on the Lévy concentration function $\mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon)$. In a trivial
example where $p = 1$, it is immediate to see that
$P(|X_1 - x| \le \epsilon) \le \epsilon \sqrt{2/(\pi \sigma_1^2)}$. A non-trivial case is the situation where $p \to \infty$. In such a
case, it is typically not known whether $\max_{1 \le j \le p} X_j$ has a limiting distribution
as $p \to \infty$ (recall that except for $\underline{\sigma} > 0$, we allow for general covariance
structures between $X_1, \dots, X_p$) and therefore it is not trivial at all whether,
for every sequence $\epsilon = \epsilon_p \to 0$ (or at some rate), $\mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \to 0$, or
how fast $\epsilon = \epsilon_p \to 0$ should be to guarantee that $\mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \to 0$.
Theorem 3 answers this question with explicit, non-asymptotic bounds.
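The one-dimensional case mentioned above can be made concrete. For a single centered Gaussian variable with standard deviation sigma, the supremum in the Lévy concentration function is attained at x = 0 (the density is symmetric and unimodal), so the function equals 2 Phi(eps/sigma) - 1, which can be written with the error function. The sketch below compares this exact value with the density bound eps * sqrt(2/(pi sigma^2)); the function names and the grid of values are illustrative assumptions.

```python
import math

def levy_concentration_gaussian(eps, sigma):
    """Exact L(X, eps) for X ~ N(0, sigma^2); equals 2*Phi(eps/sigma) - 1."""
    return math.erf(eps / (sigma * math.sqrt(2.0)))

def density_bound(eps, sigma):
    """Bound obtained from 2*eps times the maximum of the N(0, sigma^2) density."""
    return eps * math.sqrt(2.0 / (math.pi * sigma ** 2))

for sigma in (0.5, 1.0, 2.0):
    for eps in (0.01, 0.1, 1.0, 10.0):
        exact = levy_concentration_gaussian(eps, sigma)
        bound = density_bound(eps, sigma)
        print(sigma, eps, exact, bound)
```

For small eps the two quantities nearly coincide, while for large eps the exact value saturates at 1 and the bound becomes uninformative.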

Comment 5 (Relation to Ball's reverse isoperimetric inequality). Application
of Ball's [2] reverse isoperimetric inequality to our problem gives the
following anti-concentration bound:

\[ \mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \le C\epsilon p^{1/4}. \tag{6} \]

More precisely, this bound follows from equation (1.4) noted in [3], which
is based on [2], and from the fact that the sets of the form
$A_{\max}(t) = \{x \in \mathbb{R}^p : \max_{1 \le j \le p} x_j \le t\}$ are convex. Thus, the dimension $p$ appears as $p^{1/4}$ in
the bound (6). In contrast, our anti-concentration bound has $\sqrt{1 \vee \log(p/\epsilon)}$
instead of $p^{1/4}$, which results in considerably tighter bounds when $p$ is very
large. Note, however, that Ball's inequality is universal for a broad collection
$\mathcal{A}$ of convex bodies, whereas the anti-concentration inequality developed
here can be viewed as a reverse isoperimetric inequality for the collection of sets

\[ \mathcal{A}_{\max} = \{A_{\max}(t) : t \in \mathbb{R}\}. \]

The presence of $a_p$ in the bounds is essential and cannot be removed,
as the following example suggests. This shows that there does not exist a
substantially sharper estimate of the universal bound of the concentration
function than that given in Theorem 3. Potentially, there could be
refinements but they would have to rely on the particular (hence non-universal)
features of the covariance structure between $X_1, \dots, X_p$.

Example 1 (Partial converse of Theorem 3). Let $X_1, \dots, X_p$ be independent
standard Gaussian random variables. By Theorem 1.5.3 of [12], as $p \to \infty$,

\[ b_p (\max_{1 \le j \le p} X_j - d_p) \stackrel{d}{\to} G(0, 1), \tag{7} \]

where

\[ b_p := \sqrt{2 \log p}, \qquad d_p := b_p - \frac{\log(4\pi) + \log\log p}{2 b_p}, \]

and $G(0, 1)$ denotes the standard Gumbel distribution, i.e., the distribution
having the density $g(x) = e^{-x} e^{-e^{-x}}$ for $x \in \mathbb{R}$. In fact, we can show
that the density of $b_p (\max_{1 \le j \le p} X_j - d_p)$ converges to that of $G(0, 1)$
locally uniformly. To see this, we begin with noting that the density of
$b_p (\max_{1 \le j \le p} X_j - d_p)$ is given by

\[ g_p(x) = \frac{p}{b_p} \phi(d_p + b_p^{-1} x) [\Phi(d_p + b_p^{-1} x)]^{p-1}, \]

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and distribution functions of the standard
Gaussian distribution, respectively. Pick any $x \in \mathbb{R}$. Since, by the weak
convergence result (7),

\[ [\Phi(d_p + b_p^{-1} x)]^p = P(b_p (\max_{1 \le j \le p} X_j - d_p) \le x) \to e^{-e^{-x}}, \quad p \to \infty, \]

we have $[\Phi(d_p + b_p^{-1} x)]^{p-1} \to e^{-e^{-x}}$. Hence it remains to show that

\[ \frac{p}{b_p} \phi(d_p + b_p^{-1} x) \to e^{-x}. \]

Taking the logarithm of the left side yields

\[ \log p - \log b_p - \log(\sqrt{2\pi}) - (d_p + b_p^{-1} x)^2/2. \tag{8} \]

Expanding $(d_p + b_p^{-1} x)^2$ gives that

\[ d_p^2 + 2 d_p b_p^{-1} x + b_p^{-2} x^2 = b_p^2 - \log\log p - \log(4\pi) + 2x + o(1), \quad p \to \infty, \]

by which we have (8) $= -x + o(1)$. This shows that $g_p(x) \to g(x)$ for all
$x \in \mathbb{R}$. Moreover, this convergence takes place locally uniformly in $x$, i.e.,
for every $K > 0$, $g_p(x) \to g(x)$ uniformly in $x \in [-K, K]$.

On the other hand, the density of $\max_{1 \le j \le p} X_j$ is given by
$f_p(x) = p \phi(x) [\Phi(x)]^{p-1}$. By this form, for every $K > 0$, there exist a constant
$c > 0$ and a positive integer $p_0$ depending only on $K$ such that for $p \ge p_0$,

\[ \inf_{x \in [-K, K]} b_p^{-1} f_p(d_p + b_p^{-1} x) = \inf_{x \in [-K, K]} g_p(x) \ge \inf_{x \in [-K, K]} g(x) + o(1) \ge c, \]

which shows that for $p \ge p_0$,

\[ f_p(x) \ge c b_p, \quad \forall x \in [d_p - K b_p^{-1}, d_p + K b_p^{-1}]. \]

Therefore, we conclude that for $p \ge p_0$,

\[ P(|\max_{1 \le j \le p} X_j - d_p| \le \epsilon) = \int_{d_p - \epsilon}^{d_p + \epsilon} f_p(x) \, dx \ge 2 c \epsilon b_p, \quad \forall \epsilon \in [0, K b_p^{-1}]. \]

By the Gaussian maximal inequality and Lemma 2.3.15 of [10], we have

\[ \sqrt{\log p}/12 \le E[\max_{1 \le j \le p} X_j] \le \sqrt{2 \log p}. \]

Hence, by the previous result, for every $K' > 0$, there exist a constant $c' > 0$
and a positive integer $p'_0$ depending only on $K'$ such that for $p \ge p'_0$,
$a_p \ge 1$ and

\[ \mathcal{L}(\max_{1 \le j \le p} X_j, \epsilon) \ge P(|\max_{1 \le j \le p} X_j - d_p| \le \epsilon) \ge c' \epsilon a_p, \quad \forall \epsilon \in [0, K' a_p^{-1}]. \]

□
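The density convergence used in Example 1 can be inspected numerically. The sketch below assumes the normalizing constants b_p = sqrt(2 log p) and d_p = b_p - (log(4 pi) + log log p)/(2 b_p), and evaluates the density of the rescaled maximum of p independent standard Gaussian variables, namely (p/b_p) phi(d_p + x/b_p) Phi(d_p + x/b_p)^(p-1), against the standard Gumbel density. The function names are illustrative; convergence is known to be slow (the error decays only logarithmically in p), so the agreement is rough even for large p.

```python
import math

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def normal_upper_tail(y):
    return 0.5 * math.erfc(y / math.sqrt(2.0))  # 1 - Phi(y)

def rescaled_max_density(x, p):
    """Density at x of b_p * (max_j X_j - d_p) for p i.i.d. N(0, 1) variables."""
    b = math.sqrt(2.0 * math.log(p))
    d = b - (math.log(4.0 * math.pi) + math.log(math.log(p))) / (2.0 * b)
    y = d + x / b
    # Phi(y)^(p-1) computed through log1p to keep precision when p is large.
    log_cdf_power = (p - 1) * math.log1p(-normal_upper_tail(y))
    return (p / b) * normal_pdf(y) * math.exp(log_cdf_power)

def gumbel_density(x):
    return math.exp(-x - math.exp(-x))

for p in (10**3, 10**6, 10**12):
    print(p, rescaled_max_density(0.0, p), gumbel_density(0.0))
```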



4. Proofs for Section 2

4.1. Proof of Theorem 1. Here for a smooth function $f : \mathbb{R}^p \to \mathbb{R}$, we
write $\partial_j f(z) = \partial f(z)/\partial z_j$ for $z = (z_1, \dots, z_p)^T$. We shall use the following
version of Stein's identity.

Lemma 2 (Stein's identity). Let $W = (W_1, \dots, W_p)^T$ be a centered
Gaussian random vector in $\mathbb{R}^p$. Let $f : \mathbb{R}^p \to \mathbb{R}$ be a $C^1$-function such that
$E[|\partial_j f(W)|] < \infty$ for all $1 \le j \le p$. Then for every $1 \le j \le p$,

\[ E[W_j f(W)] = \sum_{k=1}^p E[W_j W_k] E[\partial_k f(W)]. \]



Proof ofLemmalE See Section A.6 of [2J; also [8|] and |2J. □ 

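We will use the following properties of the smooth max function.

Lemma 3. For every $1 \le j, k \le p$,

\[ \partial_j F_\beta(z) = \pi_j(z), \qquad \partial_j \partial_k F_\beta(z) = \beta w_{jk}(z), \]

where

\[ \pi_j(z) := e^{\beta z_j} \Big/ \sum_{m=1}^p e^{\beta z_m}, \qquad w_{jk}(z) := 1(j = k) \pi_j(z) - \pi_j(z) \pi_k(z). \]

Moreover,

\[ \pi_j(z) \ge 0, \qquad \sum_{j=1}^p \pi_j(z) = 1, \qquad \sum_{j,k=1}^p |w_{jk}(z)| \le 2. \]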

Proof of Lemma\^ The first property was noted in p|. The other properties 
follow from a direct calculation. □ 
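The properties in Lemma 3 are easy to check numerically. The sketch below assumes that the first derivatives of the smooth max function are the softmax weights pi_j(z) = exp(beta z_j) / sum_m exp(beta z_m) and that w_jk(z) = 1(j = k) pi_j(z) - pi_j(z) pi_k(z); it compares the weights with central finite differences of the smooth max function and evaluates the sum of |w_jk|. All function names and the test point are illustrative assumptions.

```python
import math

def smooth_max(z, beta):
    m = max(z)
    return m + math.log(sum(math.exp(beta * (zj - m)) for zj in z)) / beta

def softmax_weights(z, beta):
    """pi_j(z), computed stably by subtracting the maximum coordinate."""
    m = max(z)
    e = [math.exp(beta * (zj - m)) for zj in z]
    total = sum(e)
    return [ej / total for ej in e]

def hessian_weights(z, beta):
    """Matrix of w_jk(z) = 1(j = k) * pi_j(z) - pi_j(z) * pi_k(z)."""
    pi = softmax_weights(z, beta)
    p = len(z)
    return [[(pi[j] if j == k else 0.0) - pi[j] * pi[k] for k in range(p)]
            for j in range(p)]

z, beta = [0.3, -1.2, 2.5, 2.4, 0.0], 3.0   # illustrative test point
pi = softmax_weights(z, beta)
w = hessian_weights(z, beta)
abs_sum = sum(abs(v) for row in w for v in row)  # equals 2 * (1 - sum_j pi_j^2)
print(sum(pi), abs_sum)
```

The absolute sum equals 2(1 - sum_j pi_j^2), which explains why the bound 2 in the lemma does not depend on the dimension.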

Lemma 4. Let $m := g \circ F_\beta$ with $g \in C^2(\mathbb{R})$. Then for every $1 \le j, k \le p$,

\[ \partial_j \partial_k m(z) = (g'' \circ F_\beta)(z) \pi_j(z) \pi_k(z) + \beta (g' \circ F_\beta)(z) w_{jk}(z), \]

where $\pi_j$ and $w_{jk}$ are defined in Lemma 3.

Proof of Lemma 4. The proof follows from a direct calculation. □

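Proof of Theorem 1. Without loss of generality, we may assume that $X$ and
$Y$ are independent, so that $E[X_j Y_k] = 0$ for all $1 \le j, k \le p$. Consider the
following Slepian interpolation between $X$ and $Y$:

\[ Z(t) := \sqrt{t} X + \sqrt{1 - t} Y, \quad t \in [0, 1]. \]

Let $m := g \circ F_\beta$ and $\Psi(t) := E[m(Z(t))]$. Then

\[ |E[m(X)] - E[m(Y)]| = |\Psi(1) - \Psi(0)| = \Big| \int_0^1 \Psi'(t) \, dt \Big|. \]

Here we have

\[ \Psi'(t) = \frac{1}{2} \sum_{j=1}^p E[\partial_j m(Z(t)) (t^{-1/2} X_j - (1 - t)^{-1/2} Y_j)] = \frac{1}{2} \sum_{j=1}^p \sum_{k=1}^p (\sigma^X_{jk} - \sigma^Y_{jk}) E[\partial_j \partial_k m(Z(t))], \]

where the second equality follows from applying Lemma 2 to
$W = (t^{-1/2} X_j - (1 - t)^{-1/2} Y_j, Z(t)^T)^T$ and $f(W) = \partial_j m(Z(t))$. Hence

\[ \Big| \int_0^1 \Psi'(t) \, dt \Big| \le \frac{1}{2} \sum_{j,k=1}^p |\sigma^X_{jk} - \sigma^Y_{jk}| \Big| \int_0^1 E[\partial_j \partial_k m(Z(t))] \, dt \Big| \]
\[ \le \frac{1}{2} \max_{1 \le j,k \le p} |\sigma^X_{jk} - \sigma^Y_{jk}| \int_0^1 \sum_{j,k=1}^p |E[\partial_j \partial_k m(Z(t))]| \, dt \]
\[ = \frac{\Delta}{2} \int_0^1 \sum_{j,k=1}^p |E[\partial_j \partial_k m(Z(t))]| \, dt. \]

By Lemmas 3 and 4,

\[ \sum_{j,k=1}^p |\partial_j \partial_k m(Z(t))| \le |(g'' \circ F_\beta)(Z(t))| + 2\beta |(g' \circ F_\beta)(Z(t))|. \]

Therefore, we have

\[ |E[g(F_\beta(X)) - g(F_\beta(Y))]| \le \Delta \times \Big\{ \frac{1}{2} \int_0^1 E[|(g'' \circ F_\beta)(Z(t))|] \, dt + \beta \int_0^1 E[|(g' \circ F_\beta)(Z(t))|] \, dt \Big\} \tag{9} \]
\[ \le \Delta (\|g''\|_\infty/2 + \beta \|g'\|_\infty), \]

which leads to the first assertion. The second assertion follows from the
inequality (1). This completes the proof. □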



4.2. Proof of Theorem 2. We first note that we may assume that
$0 < \Delta < 1$ since otherwise the proof is trivial (take $C \ge 1$ in (2)). In what
follows, let $C > 0$ be a generic constant that depends only on $\min_{1 \le j \le p} \sigma^Y_{jj}$
and $\max_{1 \le j \le p} \sigma^Y_{jj}$, and its value may change from place to place. For $\beta > 0$,
define $e_\beta := \beta^{-1} \log p$. Consider and fix a $C^2$-function $g_0 : \mathbb{R} \to [0, 1]$ such
that $g_0(t) = 1$ for $t \le 0$ and $g_0(t) = 0$ for $t \ge 1$. For example, we may take

\[ g_0(t) = \begin{cases} 0, & t \ge 1, \\ 30 \int_t^1 s^2 (1 - s)^2 \, ds, & 0 < t < 1, \\ 1, & t \le 0. \end{cases} \]

For given $x \in \mathbb{R}$, $\beta > 0$ and $\delta > 0$, define $g_{x,\beta,\delta}(t) = g_0(\delta^{-1}(t - x - e_\beta))$.
For this function $g_{x,\beta,\delta}$, $\|g'_{x,\beta,\delta}\|_\infty = \delta^{-1} \|g'_0\|_\infty$ and $\|g''_{x,\beta,\delta}\|_\infty = \delta^{-2} \|g''_0\|_\infty$.
Moreover,

\[ 1(t \le x + e_\beta) \le g_{x,\beta,\delta}(t) \le 1(t \le x + e_\beta + \delta), \quad \forall t \in \mathbb{R}. \tag{10} \]
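The example cutoff function admits a closed form: since the integral of s^2 (1 - s)^2 over [0, 1] equals 1/30, one gets 30 times the integral from t to 1 equal to 1 - 10 t^3 + 15 t^4 - 6 t^5 on (0, 1). The sketch below uses this closed form and checks the sandwich inequality (10) on a grid; the function names and the chosen values of x, e_beta and delta are illustrative assumptions.

```python
def g0(t):
    """C^2 cutoff: 1 for t <= 0, 0 for t >= 1, 30 * int_t^1 s^2 (1-s)^2 ds between."""
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    # Closed form of the integral: 1 - (10 t^3 - 15 t^4 + 6 t^5).
    return 1.0 - (10.0 * t**3 - 15.0 * t**4 + 6.0 * t**5)

def g_shifted(t, x, e_beta, delta):
    """Rescaled and shifted cutoff g0((t - x - e_beta) / delta)."""
    return g0((t - x - e_beta) / delta)

x, e_beta, delta = 0.0, 0.5, 0.25           # illustrative values
grid = [k / 16.0 for k in range(-16, 33)]   # points in [-1, 2]
sandwich_ok = all(
    (1.0 if t <= x + e_beta else 0.0)
    <= g_shifted(t, x, e_beta, delta)
    <= (1.0 if t <= x + e_beta + delta else 0.0)
    for t in grid
)
print(g0(0.5), sandwich_ok)
```

The first and second derivatives of this polynomial vanish at both endpoints, which is what makes the glued function twice continuously differentiable.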

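For arbitrary $x \in \mathbb{R}$, $\beta > 0$ and $\delta > 0$, observe that

\[ P(\max_{1 \le j \le p} X_j \le x) \le P(F_\beta(X) \le x + e_\beta) \le E[g_{x,\beta,\delta}(F_\beta(X))] \]
\[ \le E[g_{x,\beta,\delta}(F_\beta(Y))] + C(\delta^{-2} + \beta \delta^{-1}) \Delta \]
\[ \le P(F_\beta(Y) \le x + e_\beta + \delta) + C(\delta^{-2} + \beta \delta^{-1}) \Delta \]
\[ \le P(\max_{1 \le j \le p} Y_j \le x + e_\beta + \delta) + C(\delta^{-2} + \beta \delta^{-1}) \Delta, \]

where the first inequality follows from the inequality (1), the second from
the inequality (10), the third from Theorem 1, the fourth from the
inequality (10), and the last from the inequality (1). We wish to compare
$P(\max_{1 \le j \le p} Y_j \le x + e_\beta + \delta)$ with $P(\max_{1 \le j \le p} Y_j \le x)$, and this is where
the anti-concentration inequality plays its role. By Corollary 1, we have

\[ P(\max_{1 \le j \le p} Y_j \le x + e_\beta + \delta) - P(\max_{1 \le j \le p} Y_j \le x) = P(x < \max_{1 \le j \le p} Y_j \le x + e_\beta + \delta) \]
\[ \le \mathcal{L}(\max_{1 \le j \le p} Y_j, e_\beta + \delta) \]
\[ \le C(e_\beta + \delta) \sqrt{1 \vee \log(p/(e_\beta + \delta))} \]
\[ \le C(e_\beta + \delta) \sqrt{1 \vee \log(p/\delta)}. \]

Therefore,

\[ P(\max_{1 \le j \le p} X_j \le x) - P(\max_{1 \le j \le p} Y_j \le x) \le C\{ (\delta^{-2} + \beta \delta^{-1}) \Delta + (e_\beta + \delta) \sqrt{1 \vee \log(p/\delta)} \}. \tag{11} \]

Choose $\beta$ and $\delta$ in such a way that

\[ \beta = \delta^{-1} \log p \quad \text{and} \quad \delta = \Delta^{1/3} (2 \log p)^{1/6}. \]

Recall that $p \ge 2$ and $0 < \Delta < 1$. Since $\delta \ge \Delta^{1/3} \ge \Delta$, $1 \vee \log(p/\delta) \le 2 \log(p/\Delta)$.
Hence the right side of (11) is bounded by $C \Delta^{1/3} (\log(p/\Delta))^{2/3}$.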
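The effect of this choice of the two tuning parameters can be checked numerically. The sketch below evaluates the bracketed term in (11), without the constant C, at beta = log(p)/delta and delta = Delta^(1/3) (2 log p)^(1/6), where Delta stands for the maximum covariance difference with 0 < Delta < 1, and compares it with the target rate Delta^(1/3) (log(p/Delta))^(2/3). The function names and the grid of (p, Delta) values are illustrative assumptions.

```python
import math

def rhs_of_11(p, Delta):
    """Bracketed term in (11) at the stated choice of beta and delta (constant C omitted)."""
    delta = Delta ** (1.0 / 3.0) * (2.0 * math.log(p)) ** (1.0 / 6.0)
    beta = math.log(p) / delta
    e_beta = math.log(p) / beta          # equals delta for this choice of beta
    smoothing = (delta ** -2 + beta / delta) * Delta
    anti_conc = (e_beta + delta) * math.sqrt(max(1.0, math.log(p / delta)))
    return smoothing + anti_conc

def target_rate(p, Delta):
    return Delta ** (1.0 / 3.0) * math.log(p / Delta) ** (2.0 / 3.0)

ratios = [rhs_of_11(p, D) / target_rate(p, D)
          for p in (2, 10, 10**3, 10**6)
          for D in (1e-6, 1e-3, 0.1, 0.9)]
print(max(ratios))
```

On this grid the ratio stays below a fixed constant, in line with the claim that the right side of (11) is of order Delta^(1/3) (log(p/Delta))^(2/3).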
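For the opposite direction, observe that

\[ P(\max_{1 \le j \le p} X_j \le x) \ge P(F_\beta(X) \le x) \ge E[g_{x - e_\beta - \delta, \beta, \delta}(F_\beta(X))] \]
\[ \ge E[g_{x - e_\beta - \delta, \beta, \delta}(F_\beta(Y))] - C(\delta^{-2} + \beta \delta^{-1}) \Delta \]
\[ \ge P(F_\beta(Y) \le x - \delta) - C(\delta^{-2} + \beta \delta^{-1}) \Delta \]
\[ \ge P(\max_{1 \le j \le p} Y_j \le x - e_\beta - \delta) - C(\delta^{-2} + \beta \delta^{-1}) \Delta. \]

The rest of the proof is similar and hence omitted. □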

5. Proof of Theorem 3

The proof of Theorem 3 uses some properties of Gaussian measures. We
begin with preparing technical tools. Here let $\phi(\cdot)$ and $\Phi(\cdot)$ denote the
density and distribution functions of the standard Gaussian distribution:

\[ \phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad \Phi(x) = \int_{-\infty}^x \phi(t) \, dt. \]



The following two facts were essentially noted in [28], [29] (note: [28] and [29]
did not contain a proof of Lemma 5, which we find non-trivial). For the
sake of completeness, we give their proofs after the proof of Theorem 3.

Lemma 5. Let $W_1, \dots, W_p$ be (not necessarily independent nor centered)
Gaussian random variables with unit variance. Suppose that $\mathrm{Corr}(W_j, W_k) < 1$
whenever $j \ne k$. Then the distribution of $\max_{1 \le j \le p} W_j$ is absolutely
continuous with respect to the Lebesgue measure and a version of the density is
given by

\[ f(x) = \phi(x) \sum_{j=1}^p e^{E[W_j] x - (E[W_j])^2/2} \cdot P(W_k \le x, \forall k \ne j \mid W_j = x). \tag{12} \]

Lemma 6. Let $W_0, W_1, \dots, W_p$ be (not necessarily independent nor
centered) Gaussian random variables with unit variance. Suppose that $E[W_0] \ge 0$.
Then the map

\[ x \mapsto e^{E[W_0] x - (E[W_0])^2/2} \cdot P(W_j \le x, 1 \le \forall j \le p \mid W_0 = x) \tag{13} \]

is non-decreasing on $\mathbb{R}$.

Let us also recall (a version of) the Gaussian concentration (more
precisely, deviation) inequality. See, e.g., [13], Theorem 7.1, for its proof.

Lemma 7. Let $X_1, \dots, X_p$ be (not necessarily independent) centered
Gaussian random variables with variance bounded by $\sigma^2 > 0$. Then for every
$r > 0$,

\[ P(\max_{1 \le j \le p} X_j \ge E[\max_{1 \le j \le p} X_j] + r) \le e^{-r^2/(2\sigma^2)}. \]

We are now in position to prove Theorem 3.
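Proof of Theorem 3. The proof consists of three steps.

Step 1. This step reduces the analysis to the unit variance case. Pick
any $x \ge 0$. Let $W_j := (X_j - x)/\sigma_j + x/\underline{\sigma}$. Then $E[W_j] \ge 0$ and $\mathrm{Var}(W_j) = 1$.
Define $Z := \max_{1 \le j \le p} W_j$. Then we have

\[ P(|\max_{1 \le j \le p} X_j - x| \le \epsilon) \le P\Big( \Big| \max_{1 \le j \le p} \frac{X_j - x}{\sigma_j} \Big| \le \frac{\epsilon}{\underline{\sigma}} \Big) \]
\[ \le \sup_{y \in \mathbb{R}} P\Big( \Big| \max_{1 \le j \le p} \frac{X_j - x}{\sigma_j} + \frac{x}{\underline{\sigma}} - y \Big| \le \frac{\epsilon}{\underline{\sigma}} \Big) \]
\[ = \sup_{y \in \mathbb{R}} P\Big( |Z - y| \le \frac{\epsilon}{\underline{\sigma}} \Big). \]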

Step 2. This step bounds the density of $Z$. Without loss of generality,
we may assume that $\mathrm{Corr}(W_j, W_k) < 1$ whenever $j \neq k$. Since the marginal
distribution of $W_j$ is $N(\mu_j, 1)$ where $\mu_j := \mathrm{E}[W_j] = (x/\underline{\sigma} - x/\sigma_j) \ge 0$, by
Lemma 5, $Z$ has density of the form

\[
f_p(z) = \phi(z) G_p(z), \tag{14}
\]

where the map $z \mapsto G_p(z)$ is non-decreasing by Lemma 6. Define $\bar{z} :=
(1/\underline{\sigma} - 1/\bar{\sigma}) x$, so that $\mu_j \le \bar{z}$ for every $1 \le j \le p$. Moreover, define
$\bar{Z} := \max_{1 \le j \le p} (W_j - \mu_j)$. Then



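\[
\begin{aligned}
\int_z^{\infty} \phi(u)\,du \; G_p(z) &\le \int_z^{\infty} \phi(u) G_p(u)\,du \\
&= \mathrm{P}(Z > z) \\
&\le \mathrm{P}(\bar{Z} > z - \bar{z}) \\
&\le \exp\Bigl\{ -\frac{(z - \bar{z} - \mathrm{E}[\bar{Z}])_+^2}{2} \Bigr\},
\end{aligned}
\]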

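where the last inequality is due to the Gaussian concentration inequality
(Lemma 7). Note that $W_j - \mu_j = X_j/\sigma_j$, so that

\[
\mathrm{E}[\bar{Z}] = \mathrm{E}\bigl[ \max_{1 \le j \le p} (X_j/\sigma_j) \bigr] =: a_p.
\]

Therefore, for every $z \in \mathbb{R}$,

\[
G_p(z) \le \frac{1}{1 - \Phi(z)} \exp\Bigl\{ -\frac{(z - \bar{z} - a_p)_+^2}{2} \Bigr\}. \tag{15}
\]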

Mills' inequality states that for $z > 0$,

\[
z \le \frac{\phi(z)}{1 - \Phi(z)} \le z \, \frac{1 + z^2}{z^2},
\]

and in particular $(1 + z^2)/z^2 \le 2$ when $z > 1$. Moreover, $\phi(z)/\{1 - \Phi(z)\} \le
1.53 \le 2$ for $z \in (-\infty, 1)$. Therefore,

\[
\phi(z)/\{1 - \Phi(z)\} \le 2(z \vee 1), \quad \forall z \in \mathbb{R}.
\]

Hence we conclude from this, (14), and (15) that

\[
f_p(z) \le 2(z \vee 1) \exp\Bigl\{ -\frac{(z - \bar{z} - a_p)_+^2}{2} \Bigr\}, \quad \forall z \in \mathbb{R}.
\]
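The hazard-rate bound $\phi(z)/\{1 - \Phi(z)\} \le 2(z \vee 1)$ used in Step 2 can be sanity-checked on a grid. This is a minimal numerical sketch, not a proof; the function name `hazard` and the grid $[-8, 8]$ are assumptions made for the illustration.

```python
import math

def hazard(z):
    """Gaussian hazard rate phi(z) / (1 - Phi(z))."""
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z), stable for large z
    return density / survival

grid = [k / 100.0 for k in range(-800, 801)]  # z in [-8, 8]

# Two-sided inequality for z > 0; note z (1 + z^2) / z^2 = z + 1/z.
two_sided_ok = all(z <= hazard(z) <= z + 1.0 / z for z in grid if z > 0)
# Consequence used in Step 2, checked for every z on the grid.
step2_ok = all(hazard(z) <= 2.0 * max(z, 1.0) for z in grid)

print(two_sided_ok, step2_ok, hazard(1.0))
```

The value `hazard(1.0)` is the supremum of the hazard rate over $(-\infty, 1)$, since the Gaussian hazard rate is increasing; this is where the constant 1.53 comes from.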

Step 3. By Step 2, for every $y \in \mathbb{R}$ and $t > 0$, we have

\[
\mathrm{P}(|Z - y| \le t) = \int_{y-t}^{y+t} f_p(z)\,dz \le 2t \max_{z \in [y-t, y+t]} f_p(z) \le 4t(\bar{z} + a_p + 1),
\]

where the last inequality follows from the fact that the map $z \mapsto z e^{-(z-a)^2/2}$
(with $a > 0$) is non-increasing on $[a + 1, \infty)$. Combining this inequality with
Step 1, for every $x \ge 0$ and $\epsilon > 0$, we have

\[
\mathrm{P}\Bigl( \bigl| \max_{1 \le j \le p} X_j - x \bigr| \le \epsilon \Bigr) \le 4\epsilon \{ (1/\underline{\sigma} - 1/\bar{\sigma}) |x| + a_p + 1 \} / \underline{\sigma}. \tag{16}
\]

This inequality also holds for $x < 0$ by a similar argument, and hence it
holds for every $x \in \mathbb{R}$.
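The last inequality in the first display of Step 3 can be spelled out as follows. This is a routine verification added for the reader, with the shorthand $a := \bar{z} + a_p$ (an abbreviation introduced here, not in the original); note that $a \ge 0$ because $\bar{z} \ge 0$ for $x \ge 0$ and $a_p \ge \mathrm{E}[X_1/\sigma_1] = 0$.

```latex
\begin{align*}
\frac{d}{dz}\Bigl( z e^{-(z-a)^2/2} \Bigr)
  &= \bigl( 1 - z(z-a) \bigr) e^{-(z-a)^2/2} \le 0
  && \text{for } z \ge a+1, \text{ since } z \ge 1 \text{ and } z - a \ge 1, \\
f_p(z) &\le 2(z \vee 1) \le 2(a+1)
  && \text{for } z \le a+1, \\
f_p(z) &\le 2 z e^{-(z-a)^2/2} \le 2(a+1) e^{-1/2} \le 2(a+1)
  && \text{for } z > a+1 .
\end{align*}
```

Hence $\max_z f_p(z) \le 2(\bar{z} + a_p + 1)$, and multiplying by the interval length $2t$ gives the bound $4t(\bar{z} + a_p + 1)$.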

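If $\underline{\sigma} = \bar{\sigma} = \sigma$, then we have

\[
\mathrm{P}\Bigl( \bigl| \max_{1 \le j \le p} X_j - x \bigr| \le \epsilon \Bigr) \le 4\epsilon (a_p + 1)/\sigma, \quad \forall x \in \mathbb{R}, \ \forall \epsilon > 0,
\]

which leads to the first assertion of the theorem.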
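The equal-variance bound can be compared with simulated data. The sketch below is illustrative only: it assumes independent $N(0, 1)$ coordinates (so $\sigma = 1$), replaces $a_p$ by a sample mean, and approximates the supremum over $x$ by a grid search; the names `simulate_maxima` and `concentration` are hypothetical.

```python
import bisect
import random

def simulate_maxima(p, n, seed=0):
    """Draw n copies of the maximum of p independent N(0, 1) variables, sorted."""
    rng = random.Random(seed)
    return sorted(max(rng.gauss(0.0, 1.0) for _ in range(p)) for _ in range(n))

def concentration(sorted_maxima, eps):
    """Grid estimate of sup_x P(|max - x| <= eps) from a sorted sample."""
    n = len(sorted_maxima)
    x, hi = sorted_maxima[0], sorted_maxima[-1]
    best = 0.0
    while x <= hi:
        # Number of sample points in the closed window [x - eps, x + eps].
        count = (bisect.bisect_right(sorted_maxima, x + eps)
                 - bisect.bisect_left(sorted_maxima, x - eps))
        best = max(best, count / n)
        x += eps  # move the window center by one half-width
    return best

maxima = simulate_maxima(p=20, n=20000)
a_p = sum(maxima) / len(maxima)  # Monte Carlo stand-in for E[max_j X_j / sigma_j]
for eps in (0.01, 0.05, 0.1):
    print(eps, concentration(maxima, eps), 4.0 * eps * (a_p + 1.0))
```

In this example the empirical concentration is expected to be several times smaller than $4\epsilon(a_p + 1)$; the point of the bound is its universality over covariance structures, not tightness in the independent case.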

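On the other hand, consider the case where $\underline{\sigma} < \bar{\sigma}$. Suppose first that
$0 < \epsilon \le \underline{\sigma}$. By the Gaussian concentration inequality (Lemma 7), for $|x| \ge
\epsilon + \bar{\sigma}(a_p + \sqrt{2 \log(\underline{\sigma}/\epsilon)})$, we have

\[
\begin{aligned}
\mathrm{P}\Bigl( \bigl| \max_{1 \le j \le p} X_j - x \bigr| \le \epsilon \Bigr)
&\le \mathrm{P}\Bigl( \max_{1 \le j \le p} X_j \ge |x| - \epsilon \Bigr) \\
&\le \mathrm{P}\Bigl( \max_{1 \le j \le p} X_j \ge \mathrm{E}\bigl[ \max_{1 \le j \le p} X_j \bigr] + \bar{\sigma} \sqrt{2 \log(\underline{\sigma}/\epsilon)} \Bigr) \\
&\le \epsilon/\underline{\sigma}.
\end{aligned}
\tag{17}
\]

For $|x| \le \epsilon + \bar{\sigma}(a_p + \sqrt{2 \log(\underline{\sigma}/\epsilon)})$, by (16) and using $\epsilon \le \underline{\sigma}$, we have

\[
\mathrm{P}\Bigl( \bigl| \max_{1 \le j \le p} X_j - x \bigr| \le \epsilon \Bigr) \le 4\epsilon \bigl\{ (\bar{\sigma}/\underline{\sigma}) a_p + (\bar{\sigma}/\underline{\sigma} - 1) \sqrt{2 \log(\underline{\sigma}/\epsilon)} + 2 - \underline{\sigma}/\bar{\sigma} \bigr\} / \underline{\sigma}. \tag{18}
\]

Combining (17) and (18), we obtain the inequality in (ii) for $0 < \epsilon \le \underline{\sigma}$
(with a suitable choice of $C$). If $\epsilon > \underline{\sigma}$, the inequality in (ii) trivially follows
by taking $C \ge 1/\underline{\sigma}$. This completes the proof. □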
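For the reader's convenience, the passage from (16) to (18) can be written out as follows. This is a sketch of the arithmetic only, with the shorthand $L := \sqrt{2\log(\underline{\sigma}/\epsilon)}$ introduced here; it uses $|x| \le \epsilon + \bar{\sigma}(a_p + L)$ and $\epsilon \le \underline{\sigma}$.

```latex
\begin{align*}
\Bigl( \frac{1}{\underline{\sigma}} - \frac{1}{\bar{\sigma}} \Bigr) |x|
  &\le \Bigl( \frac{1}{\underline{\sigma}} - \frac{1}{\bar{\sigma}} \Bigr) \epsilon
     + \Bigl( \frac{\bar{\sigma}}{\underline{\sigma}} - 1 \Bigr) (a_p + L) \\
  &\le \Bigl( 1 - \frac{\underline{\sigma}}{\bar{\sigma}} \Bigr)
     + \Bigl( \frac{\bar{\sigma}}{\underline{\sigma}} - 1 \Bigr) (a_p + L), \\
\Bigl( \frac{1}{\underline{\sigma}} - \frac{1}{\bar{\sigma}} \Bigr) |x| + a_p + 1
  &\le \frac{\bar{\sigma}}{\underline{\sigma}}\, a_p
     + \Bigl( \frac{\bar{\sigma}}{\underline{\sigma}} - 1 \Bigr) L
     + 2 - \frac{\underline{\sigma}}{\bar{\sigma}} .
\end{align*}
```

Multiplying the last line by $4\epsilon/\underline{\sigma}$ and inserting it into (16) yields (18).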

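Proof of Lemma 5. Let $M := \max_{1 \le j \le p} W_j$. The absolute continuity of the
distribution of $M$ is deduced from the fact that $\mathrm{P}(M \in A) \le \sum_{j=1}^{p} \mathrm{P}(W_j \in
A)$ for every Borel measurable subset $A$ of $\mathbb{R}$. Hence, to show that a version
of the density of $M$ is given by (12), it is enough to show that $\lim_{\epsilon \downarrow 0} \epsilon^{-1} \mathrm{P}(x <
M \le x + \epsilon)$ equals the right side of (12) for a.e. $x \in \mathbb{R}$.

For every $x \in \mathbb{R}$ and $\epsilon > 0$, observe that

\[
\begin{aligned}
\{x < M \le x + \epsilon\}
&= \{\exists i_0, W_{i_0} > x \text{ and } \forall i, W_i \le x + \epsilon\} \\
&= \{\exists i_1, x < W_{i_1} \le x + \epsilon \text{ and } \forall i \neq i_1, W_i \le x\} \\
&\quad \cup \{\exists i_1, \exists i_2, x < W_{i_1} \le x + \epsilon, x < W_{i_2} \le x + \epsilon \text{ and } \forall i \notin \{i_1, i_2\}, W_i \le x\} \\
&\quad \ \vdots \\
&\quad \cup \{\forall i, x < W_i \le x + \epsilon\} \\
&=: A_1^{x,\epsilon} \cup A_2^{x,\epsilon} \cup \cdots \cup A_p^{x,\epsilon}.
\end{aligned}
\]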

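Note that the events $A_1^{x,\epsilon}, A_2^{x,\epsilon}, \dots, A_p^{x,\epsilon}$ are disjoint. For $A_1^{x,\epsilon}$, since

\[
A_1^{x,\epsilon} = \bigcup_{i=1}^{p} \{x < W_i \le x + \epsilon \text{ and } W_j \le x, \forall j \neq i\},
\]

where the events on the right side are disjoint, we have

\[
\begin{aligned}
\mathrm{P}(A_1^{x,\epsilon}) &= \sum_{i=1}^{p} \mathrm{P}(x < W_i \le x + \epsilon \text{ and } W_j \le x, \forall j \neq i) \\
&= \sum_{i=1}^{p} \int_x^{x+\epsilon} \mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = u) \phi(u - \mu_i)\,du,
\end{aligned}
\]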

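where $\mu_i := \mathrm{E}[W_i]$. We show that for every $1 \le i \le p$ and a.e. $x \in \mathbb{R}$,
the map $u \mapsto \mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = u)$ is right continuous at $x$. Let
$X_j = W_j - \mu_j$ so that $X_j$ are standard Gaussian random variables. Then

\[
\mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = u) = \mathrm{P}(X_j \le x - \mu_j, \forall j \neq i \mid X_i = u - \mu_i).
\]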

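Pick $i = 1$. Let $V_j = X_j - \mathrm{E}[X_j X_1] X_1$ be the residual from the orthogonal
projection of $X_j$ on $X_1$. Note that the vector $(V_j)_{2 \le j \le p}$ and $X_1$ are jointly
Gaussian and uncorrelated, and hence independent, by which we have

\[
\begin{aligned}
&\mathrm{P}(X_j \le x - \mu_j, 2 \le \forall j \le p \mid X_1 = u - \mu_1) \\
&\quad = \mathrm{P}(V_j \le x - \mu_j - \mathrm{E}[X_j X_1](u - \mu_1), 2 \le \forall j \le p \mid X_1 = u - \mu_1) \\
&\quad = \mathrm{P}(V_j \le x - \mu_j - \mathrm{E}[X_j X_1](u - \mu_1), 2 \le \forall j \le p).
\end{aligned}
\]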

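Define $J := \{j \in \{2, \dots, p\} : \mathrm{E}[X_j X_1] \le 0\}$ and $J^c := \{2, \dots, p\} \setminus J$. Then

\[
\begin{aligned}
&\mathrm{P}(V_j \le x - \mu_j - \mathrm{E}[X_j X_1](u - \mu_1), 2 \le \forall j \le p) \\
&\quad \to \mathrm{P}(V_j \le x_j, \forall j \in J, \ V_{j'} < x_{j'}, \forall j' \in J^c), \quad \text{as } u \downarrow x,
\end{aligned}
\]

where $x_j := x - \mu_j - \mathrm{E}[X_j X_1](x - \mu_1)$. Here each $V_j$ either degenerates to 0
(which occurs only when $X_j$ and $X_1$ are perfectly negatively correlated, i.e.,
$\mathrm{E}[X_j X_1] = -1$) or has a non-degenerate Gaussian distribution, and hence
for every $x \in \mathbb{R}$ except for at most $(p - 1)$ points $(\mu_1 + \mu_j)/2$, $2 \le j \le p$,

\[
\begin{aligned}
\mathrm{P}(V_j \le x_j, \forall j \in J, \ V_{j'} < x_{j'}, \forall j' \in J^c) &= \mathrm{P}(V_j \le x_j, 2 \le \forall j \le p) \\
&= \mathrm{P}(W_j \le x, 2 \le \forall j \le p \mid W_1 = x).
\end{aligned}
\]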






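Hence for $i = 1$ and a.e. $x \in \mathbb{R}$, the map $u \mapsto \mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = u)$
is right continuous at $x$. The same conclusion clearly holds for $2 \le i \le p$.
Therefore, we conclude that, for a.e. $x \in \mathbb{R}$, as $\epsilon \downarrow 0$,

\[
\begin{aligned}
\frac{1}{\epsilon} \mathrm{P}(A_1^{x,\epsilon}) &\to \sum_{i=1}^{p} \mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = x) \phi(x - \mu_i) \\
&= \phi(x) \sum_{i=1}^{p} e^{\mu_i x - \mu_i^2/2} \, \mathrm{P}(W_j \le x, \forall j \neq i \mid W_i = x).
\end{aligned}
\]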

In the rest of the proof, we show that, for every $2 \le i \le p$ and $x \in \mathbb{R}$,
$\mathrm{P}(A_i^{x,\epsilon}) = o(\epsilon)$ as $\epsilon \downarrow 0$, which leads to the desired conclusion. Fix any
$2 \le i \le p$. The probability $\mathrm{P}(A_i^{x,\epsilon})$ is bounded by a sum of terms of
the form $\mathrm{P}(x < W_j \le x + \epsilon, x < W_k \le x + \epsilon)$ with $j \neq k$. Recall that
$\mathrm{Corr}(W_j, W_k) < 1$. Assume that $\mathrm{Corr}(W_j, W_k) = -1$. Then for every
$x \in \mathbb{R}$, $\mathrm{P}(x < W_j \le x + \epsilon, x < W_k \le x + \epsilon)$ is zero for sufficiently small
$\epsilon$. Otherwise, $(W_j, W_k)^T$ obeys a two-dimensional, non-degenerate Gaussian
distribution and hence $\mathrm{P}(x < W_j \le x + \epsilon, x < W_k \le x + \epsilon) = O(\epsilon^2) = o(\epsilon)$
as $\epsilon \downarrow 0$ for every $x \in \mathbb{R}$. This completes the proof. □

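Proof of Lemma 6. Since $\mathrm{E}[W_0] \ge 0$, the map $x \mapsto \exp(\mathrm{E}[W_0] x - (\mathrm{E}[W_0])^2/2)$
is non-decreasing. Thus it suffices to show that the map

\[
x \mapsto \mathrm{P}(W_1 \le x, \dots, W_p \le x \mid W_0 = x) \tag{19}
\]

is non-decreasing. As in the proof of Lemma 5, let $X_j = W_j - \mathrm{E}[W_j]$ and
let $V_j = X_j - \mathrm{E}[X_j X_0] X_0$ be the residual from the orthogonal projection of
$X_j$ on $X_0$. Note that the vector $(V_j)_{1 \le j \le p}$ and $X_0$ are independent. Hence
the probability in (19) equals

\[
\begin{aligned}
&\mathrm{P}(V_j \le x - \mu_j - \mathrm{E}[X_j X_0](x - \mathrm{E}[W_0]), 1 \le \forall j \le p \mid X_0 = x - \mathrm{E}[W_0]) \\
&\quad = \mathrm{P}(V_j \le x - \mu_j - \mathrm{E}[X_j X_0](x - \mathrm{E}[W_0]), 1 \le \forall j \le p),
\end{aligned}
\]

where the latter is non-decreasing in $x$ on $\mathbb{R}$ since $\mathrm{E}[X_j X_0] \le 1$. □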
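The final monotonicity claim can be made explicit as follows. The symbols $\rho_j$ and $c_j(x)$ are shorthand introduced here for illustration, with $\rho_j := \mathrm{E}[X_j X_0] \in [-1, 1]$ and $\mu_j := \mathrm{E}[W_j]$.

```latex
\begin{align*}
c_j(x) &:= x - \mu_j - \rho_j \bigl( x - \mathrm{E}[W_0] \bigr)
        = (1 - \rho_j)\, x - \mu_j + \rho_j\, \mathrm{E}[W_0], \\
\frac{d}{dx}\, c_j(x) &= 1 - \rho_j \ge 0, \\
x \le x' \ &\Longrightarrow\
  \bigcap_{j=1}^{p} \{ V_j \le c_j(x) \} \subseteq \bigcap_{j=1}^{p} \{ V_j \le c_j(x') \}.
\end{align*}
```

Since the law of $(V_j)_{1 \le j \le p}$ does not depend on $x$, the probability of the intersection is non-decreasing in $x$.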

Appendix A. Proof of Lemma 1

Lemma 1 follows from the following maximal inequality and Hölder's
inequality. Here we write $a \lesssim b$ if $a$ is smaller than or equal to $b$ up to a
universal positive constant.

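Lemma 8. Let $Z_1, \dots, Z_n$ be independent random vectors in $\mathbb{R}^p$ with $p \ge 2$.
Define $M := \max_{1 \le i \le n} \max_{1 \le j \le p} |Z_{ij}|$ and $\sigma^2 := \max_{1 \le j \le p} \sum_{i=1}^{n} \mathrm{E}[Z_{ij}^2]$.
Then

\[
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} (Z_{ij} - \mathrm{E}[Z_{ij}]) \Bigr| \Bigr] \lesssim \bigl( \sigma \sqrt{\log p} + \sqrt{\mathrm{E}[M^2]} \log p \bigr).
\]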

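We shall use the following lemma.

Lemma 9. Let $V_1, \dots, V_n$ be independent random vectors in $\mathbb{R}^p$ with $p \ge 2$
such that $V_{ij} \ge 0$ for all $1 \le i \le n$ and $1 \le j \le p$. Then

\[
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \sum_{i=1}^{n} V_{ij} \Bigr] \lesssim \max_{1 \le j \le p} \mathrm{E}\Bigl[ \sum_{i=1}^{n} V_{ij} \Bigr] + \mathrm{E}\Bigl[ \max_{1 \le i \le n} \max_{1 \le j \le p} V_{ij} \Bigr] \log p.
\]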

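Proof of Lemma 9. We make use of the symmetrization technique. Let
$\varepsilon_1, \dots, \varepsilon_n$ be independent Rademacher random variables (i.e., $\mathrm{P}(\varepsilon_i = 1) =
\mathrm{P}(\varepsilon_i = -1) = 1/2$) independent of $V_1^n := \{V_1, \dots, V_n\}$. Then by the triangle
inequality and Lemma 2.3.1 in [25],

\[
\begin{aligned}
I := \mathrm{E}\Bigl[ \max_{1 \le j \le p} \sum_{i=1}^{n} V_{ij} \Bigr]
&\le \max_{1 \le j \le p} \mathrm{E}\Bigl[ \sum_{i=1}^{n} V_{ij} \Bigr] + \mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} (V_{ij} - \mathrm{E}[V_{ij}]) \Bigr| \Bigr] \\
&\le \max_{1 \le j \le p} \mathrm{E}\Bigl[ \sum_{i=1}^{n} V_{ij} \Bigr] + 2 \mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} \varepsilon_i V_{ij} \Bigr| \Bigr].
\end{aligned}
\]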



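By Lemmas 2.2.2 and 2.2.7 in [25], we have

\[
\begin{aligned}
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} \varepsilon_i V_{ij} \Bigr| \;\Big|\; V_1^n \Bigr]
&\lesssim \max_{1 \le j \le p} \Bigl( \sum_{i=1}^{n} V_{ij}^2 \Bigr)^{1/2} \sqrt{\log p} \\
&\le \sqrt{B \log p} \, \max_{1 \le j \le p} \Bigl( \sum_{i=1}^{n} V_{ij} \Bigr)^{1/2},
\end{aligned}
\]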

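where $B := \max_{1 \le i \le n} \max_{1 \le j \le p} V_{ij}$. Hence by Fubini's theorem and the
Cauchy-Schwarz inequality,

\[
\begin{aligned}
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} \varepsilon_i V_{ij} \Bigr| \Bigr]
&\lesssim \sqrt{\mathrm{E}[B] \log p} \, \Bigl( \mathrm{E}\Bigl[ \max_{1 \le j \le p} \sum_{i=1}^{n} V_{ij} \Bigr] \Bigr)^{1/2} \\
&= \sqrt{\mathrm{E}[B] \log p} \, \sqrt{I}.
\end{aligned}
\]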

Therefore, we have

\[
I \lesssim \max_{1 \le j \le p} \mathrm{E}\Bigl[ \sum_{i=1}^{n} V_{ij} \Bigr] + \sqrt{\mathrm{E}[B] \log p} \, \sqrt{I} =: a + b \sqrt{I}.
\]

Solving this inequality, we conclude that $I \lesssim a + b^2$. □
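One way to carry out the last step is sketched below. It assumes $I < \infty$ and writes $C$ for the universal constant hidden in $\lesssim$ (both are conventions adopted here, not notation from the original); the middle inequality is $uv \le u^2/2 + v^2/2$ with $u = Cb$ and $v = \sqrt{I}$.

```latex
\begin{align*}
I &\le C a + C b \sqrt{I}
   \le C a + \frac{C^2 b^2}{2} + \frac{I}{2}, \\
\frac{I}{2} &\le C a + \frac{C^2 b^2}{2}, \\
I &\le 2 C a + C^2 b^2 \lesssim a + b^2 .
\end{align*}
```

With $b^2 = \mathrm{E}[B] \log p$, this is exactly the bound stated in Lemma 9.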

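Proof of Lemma 8. Let $\varepsilon_1, \dots, \varepsilon_n$ be independent Rademacher random
variables independent of $Z_1, \dots, Z_n$. Then arguing as in the previous proof, we
have

\[
\begin{aligned}
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} (Z_{ij} - \mathrm{E}[Z_{ij}]) \Bigr| \Bigr]
&\le 2 \mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl| \sum_{i=1}^{n} \varepsilon_i Z_{ij} \Bigr| \Bigr] \\
&\lesssim \mathrm{E}\Bigl[ \max_{1 \le j \le p} \Bigl( \sum_{i=1}^{n} Z_{ij}^2 \Bigr)^{1/2} \Bigr] \sqrt{\log p} \\
&\le \Bigl( \mathrm{E}\Bigl[ \max_{1 \le j \le p} \sum_{i=1}^{n} Z_{ij}^2 \Bigr] \Bigr)^{1/2} \sqrt{\log p}. \quad \text{(Jensen)}
\end{aligned}
\]

By Lemma 9 applied to $V_{ij} = Z_{ij}^2$, we have

\[
\mathrm{E}\Bigl[ \max_{1 \le j \le p} \sum_{i=1}^{n} Z_{ij}^2 \Bigr] \lesssim \sigma^2 + \mathrm{E}[M^2] \log p.
\]

This implies the desired conclusion. □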